bench: reset stateful benchmarks between timed calls by l0rinc · Pull Request #130 · l0rinc/bitcoin

l0rinc · 2026-04-12T14:47:07Z

This stack fixes several bench_bitcoin benchmarks that carried mutable state across timed calls.
Later timed calls could start from meaningfully different state than the first one and end up measuring different work.

To make that easier to review, the stack first adds commented reviewer checks.
Those checks can be uncommented locally on the pre-fix code to show that the second timed call no longer starts from the same state as the first.
The follow-up commits then keep those checks in the same form and make them pass.

This stack:

drops one duplicate wallet balance benchmark
fixes the ephemeral spend benchmark input construction bug
guards nanobench::setup() against use with multi-iteration epochs
adds reviewer checks for the stateful benchmarks
resets the stateful ChaCha, BenchLockedPool, pool allocator, WriteBlockBench, and rolling bloom benchmarks between timed calls

This stack intentionally does not try to remove every small source of variation.
It only fixes benchmarks where the carried state changed the work in a meaningful way.

`WalletBalanceMine` duplicated `WalletBalanceClean` exactly. Both registrations called `WalletBalance(bench, /*set_dirty=*/false, /*add_mine=*/true)`. No runtime reproducer is needed here because the duplicate is visible in the registration code itself. Remove the duplicate registration so the balance benchmark list stays distinct.

`MempoolCheckEphemeralSpends` only filled `tx2.vin[0]` in a loop. That left the rest of the inputs with default prevouts and built the wrong package shape. Write each prevout to `vin[i]` instead and assert that the last child input spends the last parent output.

`setup()` in nanobench runs once per epoch, while an epoch can still execute the benchmark body multiple times. With the default multi-epoch benchmarking, that makes it easy to mistake per-sample setup for per-iteration setup and silently benchmark changing state inside one sample. Fail fast when `setup()` is combined with anything other than `epochIterations(1)`.

Add commented-out assertions ahead of the fixes in this stack. Each assertion can be uncommented on the pre-fix code to confirm the benchmark does not start each timed call from the same state. The follow-up commits uncomment these checks and make them pass.

`CHACHA20` and `FSCHACHA20POLY1305` kept one cipher object alive across timed calls. That advanced the stream position, and `FSChaCha20Poly1305` eventually crossed a rekey boundary. The benchmark now keeps the same reviewer check in place. A fresh zero-key `CHACHA20` stream starts with `0x76`, and a fresh zero-key `FSChaCha20Poly1305` encryption starts with `0x9f`. Those assertions fail on the old code on the second timed call. Rebuild the cipher in `setup()` for each timed call so the measured work starts from the same state.

`BenchLockedPool` carried allocator state across timed calls. That changed the work performed after the first call. The benchmark keeps the same reviewer check in place. The initial seed must still be `INITIAL_STATE`, and that assertion fails on the old code on the second timed call. Rebuild the arena in `setup()` so each timed call starts from the same allocator state.

The pool map benchmarks cleared their maps inside the timed call but reused the grown bucket state. That changed the insertion preconditions for later iterations. The benchmark keeps the same reviewer check in place. It records the reset bucket count and asserts that each timed call starts from that same bucket layout. That fails on the old code on the second timed call. Reset the maps in `setup()` with `rehash(0)` so each timed call starts from the same empty bucket layout.

`WriteBlockBench` kept appending to one block manager across timed calls. That changed the write position for later iterations. The benchmark keeps the same reviewer check in place. A fresh block manager should write to `FlatFilePos{0, STORAGE_HEADER_BYTES}`, and that assertion fails on the old code on the second timed call. Rebuild the testing setup in `setup()` so each timed call writes into the same fresh block storage state.

`RollingBloom` kept one filter alive across timed calls. That changed the filter state and the later work it performed. The benchmark keeps the same reviewer check in place. A fresh timed call should not already contain the first inserted value, and that assertion fails on the old code on the second timed call. Reset the filter in `setup()` so each timed call starts from the same empty state.

l0rinc added 9 commits April 12, 2026 16:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bench: reset stateful benchmarks between timed calls#130

bench: reset stateful benchmarks between timed calls#130
l0rinc wants to merge 9 commits intomasterfrom
l0rinc/reset-stateful-benchmarks

l0rinc commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

l0rinc commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant